Privacy Preserving Clustering with Constraints
The k-center problem is a classical combinatorial optimization problem that asks for k centers such that the maximum distance from any point of an input set P to its assigned center is minimized. The problem admits elegant 2-approximations. However, it becomes significantly more difficult once constraints are added. We ask whether general methods can be derived that turn an approximation algorithm for a clustering problem with some constraints into an approximation algorithm that respects one additional constraint. Our constraint of choice is privacy: a center may only be opened if at least l clients are assigned to it. We show how to combine privacy with several other constraints.
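As an illustration of the kind of 2-approximation the abstract alludes to, the following is a minimal sketch of the classical greedy farthest-first traversal for unconstrained k-center. It is not taken from the paper; the function names, the Euclidean metric, and the nearest-center assignment used to count cluster sizes are assumptions made for the example. The size count only checks whether a solution happens to satisfy a privacy bound l, it does not enforce it.

```python
import math


def farthest_first_centers(points, k):
    """Greedy farthest-first traversal: repeatedly open the point farthest
    from the centers chosen so far (hypothetical helper, Euclidean metric)."""
    if not points or k <= 0:
        return []
    centers = [points[0]]
    # dist[i] = distance from points[i] to its nearest chosen center
    dist = [math.dist(p, points[0]) for p in points]
    while len(centers) < min(k, len(points)):
        far = max(range(len(points)), key=lambda i: dist[i])
        centers.append(points[far])
        dist = [min(d, math.dist(p, points[far])) for d, p in zip(dist, points)]
    return centers


def kcenter_radius(points, centers):
    """k-center objective: largest distance from a point to its nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)


def cluster_sizes(points, centers):
    """Number of points assigned to each center under nearest-center assignment."""
    sizes = [0] * len(centers)
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        sizes[nearest] += 1
    return sizes


points = [(0, 0), (1, 0), (10, 0), (11, 0)]
centers = farthest_first_centers(points, 2)
l = 2  # assumed privacy bound for this example
is_private = all(size >= l for size in cluster_sizes(points, centers))
```

A privacy-aware algorithm would generally have to choose both centers and assignments differently; the check above merely makes the constraint concrete.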
On the Cost of Essentially Fair Clusterings
Clustering is a fundamental tool in data mining. It partitions points into
groups (clusters) and may be used to make decisions for each point based on its
group. However, this process may harm protected (minority) classes if the
clustering algorithm does not adequately represent them in desirable clusters
-- especially if the data is already biased.
At NIPS 2017, Chierichetti et al. proposed a model for fair clustering
requiring the representation in each cluster to (approximately) preserve the
global fraction of each protected class. Restricting to two protected classes,
they developed both a 4-approximation for the fair k-center problem and an
O(t)-approximation for the fair k-median problem, where t is a parameter
for the fairness model. For multiple protected classes, the best known result
is a 14-approximation for fair k-center.
We extend and improve the known results. Firstly, we give a 5-approximation
for the fair k-center problem with multiple protected classes. Secondly, we
propose a relaxed fairness notion under which we can give bicriteria
constant-factor approximations for all of the classical clustering objectives
k-center, k-supplier, k-median, k-means and facility location. The
latter approximations are achieved by a framework that takes an arbitrary
existing unfair (integral) solution and a fair (fractional) LP solution and
combines them into an essentially fair clustering with a weakly supervised
rounding scheme. In this way, a fair clustering can be established belatedly,
in a situation where the centers are already fixed.
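To make the fairness requirement concrete, here is a minimal sketch that measures how far each cluster's class composition deviates from the global fraction of each protected class. This is only an illustrative reading of "preserving the global fraction"; the function names and the deviation measure are assumptions and do not reproduce the paper's formal definition of (essential) fairness.

```python
from collections import Counter
from fractions import Fraction


def class_fractions(classes):
    """Fraction of each protected class within a list of class labels."""
    counts = Counter(classes)
    total = len(classes)
    return {c: Fraction(m, total) for c, m in counts.items()}


def max_fairness_deviation(assignment, classes):
    """Largest absolute gap between a cluster's class fraction and the
    global class fraction (hypothetical measure, 0 means exactly balanced)."""
    global_frac = class_fractions(classes)
    clusters = {}
    for cluster_id, cls in zip(assignment, classes):
        clusters.setdefault(cluster_id, []).append(cls)
    worst = Fraction(0)
    for members in clusters.values():
        local = class_fractions(members)
        for cls, g in global_frac.items():
            worst = max(worst, abs(local.get(cls, Fraction(0)) - g))
    return worst


classes = ["a", "b", "a", "b"]
balanced = max_fairness_deviation([0, 0, 1, 1], classes)    # each cluster mirrors the global mix
segregated = max_fairness_deviation([0, 1, 0, 1], classes)  # each cluster holds a single class
```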
Key Factors of Winter Cereal Yield after Grain Legumes in Organic Farming Practice
With the aim of identifying key factors for cereal yield and mineral nitrogen content, 87 fields of wheat, spelt, rye, triticale and barley on 31 organic farms were evaluated from 2009 to 2011. According to multiple linear regression analysis, the factors leading to high cereal yields were high mineral nitrogen content, high soil water storage and high cereal coverage in spring, as well as deep soil, low weed pressure, high available phosphorus content and low growing frequency in previous years. Key factors for high mineral nitrogen content in spring included low cereal coverage in spring, vegetation cover in the winters of previous years, high soil organic matter and silt content, low C/N ratio, low plough depth, legume intercropping before sowing and the use of manure or slurry nitrogen. An additional survey year with additional parameters will be conducted before the project’s official end.
Factors of Field Pea Yield in Organic Farming Practice
With the aim of identifying key factors for field pea yield, 52 fields on 21 organic farms were evaluated from 2009 to 2011. Criteria for evaluation were soil variables, crop data, management and field history. The average pea yield of all farms was 2.1 t/ha, including 5 fields that were not harvested due to inadequate growth and/or high weed pressure. Based on measurements at three control points per field, yield was 65 % higher when harvesting by hand. The main causes were heterogeneous crops and high yield losses through combine harvesting. According to multiple linear regression analysis of yields at the control points, factors leading to high pea yield included deep soil, high available phosphorus content, low clay content and longer intervals between pea crops. Reduction of weed pressure and deep sowing (up to 6 cm) were the main crop management variables affecting yield. An additional survey year with additional parameters will be conducted before the project’s official end.
A Theory of Visibility Measures in the Dissociation Paradigm
Research on perception without awareness primarily relies on the dissociation
paradigm, which compares a measure of awareness of a critical stimulus (direct
measure) with a measure indicating that the stimulus has been processed at all
(indirect measure). We argue that dissociations between direct and indirect
measures can only be demonstrated with respect to the critical stimulus feature
that generates the indirect effect, and the observer's awareness of that
feature, the critical cue. We expand Kahneman's (1968) concept of criterion
content to comprise the set of all cues that an observer actually uses to
perform the direct task. Different direct measures can then be compared by
studying the overlap of their criterion contents and their containment of the
critical cue. Because objective and subjective measures may integrate different
sets of cues, one measure generally cannot replace the other without
sacrificing important information. Using a simple mathematical formalization,
we redefine and clarify the concepts of validity, exclusiveness, and
exhaustiveness in the dissociation paradigm, show how dissociations among
different awareness measures falsify simple theories of consciousness, and
formulate the demand that theories of visual awareness should be sufficiently
specific to explain dissociations among different facets of awareness.
Comment: v1: initial upload. v2: added arXiv identifier. v3: corrected an
error in mathematical notation in the "definition (iii)" section. v5: adds
reference to the published article. Note that the manuscript responding to
this preprint has now been published in Psychonomic Bulletin & Review and
should be cited preferentially.
Achieving Anonymity via Weak Lower Bound Constraints for k-Median and k-Means
We study k-clustering problems with lower bounds, including k-median and k-means clustering with lower bounds. In addition to the point set P and the number of centers k, a k-clustering problem with (uniform) lower bounds gets a number B. The solution space is restricted to clusterings where every cluster has at least B points. We demonstrate how to approximate k-median with lower bounds via a reduction to facility location with lower bounds, for which O(1)-approximation algorithms are known.
Then we propose a new constrained clustering problem with lower bounds where we allow points to be assigned multiple times (to different centers). This means that for every point, the clustering specifies a set of centers to which it is assigned. We call this clustering with weak lower bounds. We give an 8-approximation for k-median clustering with weak lower bounds and an O(1)-approximation for k-means with weak lower bounds.
We conclude by showing that at a constant increase in the approximation factor, we can restrict the number of assignments of every point to 2 (or, if we allow fractional assignments, to 1+ε). This also leads to the first bicriteria approximation algorithm for k-means with (standard) lower bounds, where bicriteria is interpreted in the sense that the lower bounds are violated by a constant factor.
All algorithms in this paper run in time that is polynomial in n and k (and d for the Euclidean variants considered).
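The difference between standard and weak lower bounds can be made concrete with a small feasibility check. The sketch below is only an illustration under stated assumptions: each point carries a set of center indices, every listed center counts as open, and the k-median cost is taken to sum the Euclidean distance over every (point, center) assignment. The function names and this cost convention are hypothetical and are not taken from the paper.

```python
import math


def satisfies_lower_bounds(assignment, num_centers, B):
    """True if every point is assigned at least once and every center
    receives at least B points. assignment[i] is a set of center indices."""
    load = [0] * num_centers
    for assigned in assignment:
        if not assigned:
            return False
        for c in set(assigned):
            load[c] += 1
    return all(n >= B for n in load)


def kmedian_cost(points, centers, assignment):
    """Sum of distances over all (point, center) assignments
    (assumed cost convention when points are assigned multiple times)."""
    return sum(
        math.dist(points[i], centers[c])
        for i, assigned in enumerate(assignment)
        for c in set(assigned)
    )


points = [(0, 0), (1, 0), (2, 0)]
centers = [(0, 0), (2, 0)]
standard = [{0}, {0}, {1}]   # every point assigned exactly once
weak = [{0}, {0, 1}, {1}]    # the middle point is assigned to both centers
```

With B = 2, the single-assignment solution leaves the second center with only one point, while the multi-assignment solution meets the bound at the price of paying for the middle point twice.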
Data Mining and Scientific Research – de lege lata and de lege ferenda
The preliminary draft of the new Copyright Act contains, in Art. 24d E-URG, a provision governing the use of works for scientific purposes. Under it, the reproduction and adaptation of copyrighted works is to be permitted in future, provided this is required by the application of a technical process. The new provision targets cases of so-called text and data mining, i.e. the computer-assisted search, analysis and linking of data with the aim of obtaining new insights and relationships. The question arises whether the uses of works named in Art. 24d E-URG are not already permitted under current law. This article first describes the process of text and data mining (I.) and its relevance under copyright law (II.), before examining the copyright limitations that may apply (III.). Finally, it addresses the new provision, Art. 24d E-URG (IV.).